The main goal of this notebook is to evaluate the usefulness of seqICP when applied on simulations of a simple VAR(1) process.
First, lets run the example for the package documentation so as to check everything is fine.
Let \(e \in \{a,b\}\) be two different environments governed by the following SCM:
\[ x^a_{1,t} = N_a(0.3, 0.09) \\ x^a_{3,t} = x^a_{1,t} + N_a(0.2, 0.04) \\ y^a_{t} = -0.7x^a_{1,t} + 0.6x^a_{3,t} + N_a(0.1, 0.01) \\ x^a_{2,t} = -0.5*y^a_{t} + 0.5*x^a_{3,t} + N_a(0.1, 0.01) \]
\[ x^b_{1,t} = N_b(0.3, 0.09) \\ x^b_{3,t} = N_b(0.5, 0.25) \\ y^b_{t} = -0.7x^b_{1,t} + 0.6x^b_{3,t} + N_b(0.1, 0.01) \\ x^a_{2,t} = -0.5*y^a_{t} + 0.5*x^a_{3,t} + N_a(0.1, 0.01) \]
Thus, a change has occurred on the distribution of \(x^b_{3,t}\) between environments, whereas the SCM of \(y_t\) remained constant. More precisely, we have:
\[ x^a_{3,t} \sim N(x^a_{1,t} + 0.2, 0.04) \\ x^b_{3,t} \sim N_b(0.5, 0.25) \\ \]
and
\[ y^e_{t} = -0.7x^e_{1,t} + 0.6x^e_{3,t} + N_e(0.1, 0.01) \\ \] regardless of \(e\). Lets sample the above process:
set.seed(1)
# environment 1
na <- 140
X1a <- 0.3*rnorm(na)
X3a <- X1a + 0.2*rnorm(na)
Ya <- -.7*X1a + .6*X3a + 0.1*rnorm(na)
X2a <- -0.5*Ya + 0.5*X3a + 0.1*rnorm(na)
# environment 2
nb <- 80
X1b <- 0.3*rnorm(nb)
X3b <- 0.5*rnorm(nb)
Yb <- -.7*X1b + .6*X3b + 0.1*rnorm(nb)
X2b <- -0.5*Yb + 0.5*X3b + 0.1*rnorm(nb)
# combine environments
X1 <- c(X1a,X1b)
X2 <- c(X2a,X2b)
X3 <- c(X3a,X3b)
Y <- c(Ya,Yb)
Xmatrix <- cbind(X1, X2, X3)
svar_sim <- cbind(Y, Xmatrix)
Here are the autocorrelation functions
for (i in 1:dim(svar_sim)[2]){
print(paste0("x", i))
ggtsdisplay(svar_sim[, i])
}
## [1] "x1"
## [1] "x2"
## [1] "x3"
## [1] "x4"
and the autocorrelation of the square of the series
for (i in 1:dim(svar_sim)[2]){
print(paste0("x", i))
ggtsdisplay(svar_sim[, i]^2)
}
## [1] "x1"
## [1] "x2"
## [1] "x3"
## [1] "x4"
We can note a few interesting observations from the DGP definition and the ACF/PACF.First, from the DGP definition we can conclude that there are only contemporaneous dependencies in the time series. Despite this fact, from a quick inspection of the ACF and PACF’s of the raw time series it seems that the shift on \(x_{3,t}\)’s distribution has introduced some time dependencies on \(y_t\) and itself. For instance, lags two and five for the ACF of \(y_t\) and lag five for the ACF of \(x_{3,t}\) are significant. Third, we can see that shift in \(x_{3,t}\) distribution translates into a shift in \(y_t\)’s variance. From a a visual inspection of the time series of the squares of \(y_t\) and \(x_{3,t}\) we can see that there is a big shift in its variance starting from timestep 130. Furthermore, this shift has introduced arch effects into the time series, which translates into heteroskedasticity of the processes.
From the above observations we can make the following statements about the target \(y_t\):
- It is non-stationary
- Its DGP is invariant across enviroments
- The main shift is in its variance
- There are only comtemporaneous dependencies in the DGP
We now run seqICP on the simulations:
seqICP.result <- seqICP(X = Xmatrix,
Y = Y,
stopIfEmpty=FALSE,
silent=FALSE)
## Currently fitting set S = {}
## p-value: 0.02
## Currently fitting set S = {1}
## p-value: 0.02
## Currently fitting set S = {2}
## p-value: 0.02
## Currently fitting set S = {3}
## p-value: 0.02
## Currently fitting set S = {1, 2}
## p-value: 0.02
## Currently fitting set S = {1, 3}
## p-value: 0.91
## Currently fitting set S = {2, 3}
## p-value: 0.02
## Currently fitting set S = {1, 2, 3}
## p-value: 0.22
summary(seqICP.result)
##
## Invariant Linear Causal Regression at level 0.05
## Variables X1, X3 show a significant causal effect
##
## coefficient lower bound upper bound p-value
## intercept 0.0 -0.05900 0.0179 NA
## X1 -0.7 -0.75200 -0.5292 0.02 *
## X2 0.0 0.00000 0.0000 0.91
## X3 0.6 0.57000 0.7228 0.02 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
seqICP.result <- seqICP(X = Xmatrix,
Y = Y,
model = "ar",
stopIfEmpty=FALSE,
silent=FALSE)
## Currently fitting set S = {}
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 215 are removed, to account for lag p=5
## p-value: 0.02
## Currently fitting set S = {1}
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 215 are removed, to account for lag p=5
## p-value: 0.02
## Currently fitting set S = {2}
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 215 are removed, to account for lag p=5
## p-value: 0.02
## Currently fitting set S = {3}
## p-value: 0.02
## Currently fitting set S = {1, 2}
## p-value: 0.02
## Currently fitting set S = {1, 3}
## p-value: 0.97
## Currently fitting set S = {2, 3}
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 218 are removed, to account for lag p=2
## p-value: 0.08
## Currently fitting set S = {1, 2, 3}
## p-value: 0.2
summary(seqICP.result)
##
## Invariant Linear Causal Regression at level 0.05
## Variable X3[t] shows a significant causal effect
##
## coefficient lower bound upper bound p-value
## intercept -0.01 -0.0710 0.0179 NA
## X1[t] 0.00 0.0000 0.0000 0.079 .
## X2[t] 0.00 0.0000 0.0000 0.970
## X3[t] 0.44 0.5700 0.7817 0.020 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
From the above results we can see that seqICP was able to find the correct set of causal parents even if we didn’t know that no lags were present on the original DGP.
Now we proceed by testing seqICP on a multivariate time series process that has lag dependencies across variables but no clear shift in distribution. Consider the following DGP:
\[ x_{1,t} = 0.2x_{1,t-1} + 0.7x_{3,t-1} + N(0, 1)\\ x_{2,t} = 0.2x_{2,t-1} + N(0, 1)\\\ x_{3,t} = 0.7x_{1,t-1} + 0.2x_{2,t-1} + N(0, 1)\\\ \]
## VAR.sim: simulate VAR as in Enders 2004, p 268
B1 <- matrix(c(0.2, 0, 0.7, 0, 0.2, 0, 0.7, 0.2, 0), nrow = 3, ncol = 3, byrow = TRUE)
var_sim <- VAR.sim(B=B1, n=na+nb, include="none")
Here are the autocorrelation functions
for (i in 1:dim(var_sim)[2]){
print(paste0("x", i))
ggtsdisplay(var_sim[, i])
}
## [1] "x1"
## [1] "x2"
## [1] "x3"
and the autocorrelation of the square of the series
for (i in 1:dim(var_sim)[2]){
print(paste0("x", i))
ggtsdisplay(var_sim[, i]^2)
}
## [1] "x1"
## [1] "x2"
## [1] "x3"
From a visual inspection of the ACFs and PACFs of the raw and squares of
the time series we can notice two main differences. First, all raw time
series have a much stronger time dependence which is induced by the lags
of the time series. Furthermore, there is no clear shift in mean and/or
variance of the series. Therefore we can conclude that the process is
jointly stationary.
var_sim_df <- var_sim %>% as.data.table()
colnames(var_sim_df) <- c("x1", "x2", "x3")
varnames <- colnames(var_sim_df)
Ss <- list()
Sdetails <- list()
for (vn in varnames){
X <- var_sim_df %>% dplyr::select(-one_of(vn))
y <- var_sim_df %>% dplyr::select(one_of(vn))
output <- seqICP(X=X, Y=y, model = "ar",stopIfEmpty = FALSE, silent = TRUE)
Sdetails[[vn]] <- output
print(paste0("Target: ", vn, " Features: ", paste(colnames(X), collapse = " ")))
summary(output)
}
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 219 are removed, to account for lag p=1
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 219 are removed, to account for lag p=1
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 219 are removed, to account for lag p=1
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 219 are removed, to account for lag p=1
## [1] "Target: x1 Features: x2 x3"
##
## Invariant Linear Causal Regression at level 0.05
## No variable shows a significant causal effect
##
## coefficient lower bound upper bound p-value
## intercept -0.03 -0.1878 0.0984 NA
## X1[t] 0.00 0.0000 0.0000 0.95
## X2[t] 0.00 0.0000 0.0000 0.93
## Y0[t-1] 0.20 -0.1345 0.2936 NA
## X1[t-1] 0.00 -0.1141 0.2947 NA
## X2[t-1] 0.67 -0.1636 0.7667 NA
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 219 are removed, to account for lag p=1
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 219 are removed, to account for lag p=1
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 219 are removed, to account for lag p=1
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 219 are removed, to account for lag p=1
## [1] "Target: x2 Features: x1 x3"
##
## Invariant Linear Causal Regression at level 0.05
## No variable shows a significant causal effect
##
## coefficient lower bound upper bound p-value
## intercept 0.05 -0.0950 0.205 NA
## X1[t] 0.00 0.0000 0.000 1
## X2[t] 0.00 0.0000 0.000 1
## Y0[t-1] 0.10 -0.1738 0.231 NA
## X1[t-1] 0.05 -0.1732 0.245 NA
## X2[t-1] -0.07 -0.1886 0.244 NA
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 212 are removed, to account for lag p=8
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 212 are removed, to account for lag p=8
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## [1] "Target: x3 Features: x1 x2"
##
## Invariant Linear Causal Regression at level 0.05
## No variable shows a significant causal effect
##
## coefficient lower bound upper bound p-value
## intercept 0.22 0.0430 0.373 NA
## X1[t] 0.00 0.0000 0.000 0.22
## X2[t] 0.00 0.0000 0.000 0.36
## Y0[t-1] 0.01 -0.1340 0.319 NA
## X1[t-1] 0.65 -0.2700 0.794 NA
## X2[t-1] 0.28 -0.2700 0.796 NA
## Y0[t-2] 0.07 -0.1060 0.758 NA
## X1[t-2] -0.06 -0.2250 0.404 NA
## X2[t-2] -0.08 -0.2260 0.220 NA
## Y0[t-3] -0.08 -0.2540 0.142 NA
## X1[t-3] 0.01 -0.2560 0.177 NA
## X2[t-3] -0.03 -0.2570 0.177 NA
## Y0[t-4] -0.15 -0.3200 0.175 NA
## X1[t-4] 0.03 -0.3200 0.201 NA
## X2[t-4] 0.18 -0.2720 0.314 NA
## Y0[t-5] 0.02 -0.1530 0.314 NA
## X1[t-5] 0.02 -0.1530 0.287 NA
## X2[t-5] 0.04 -0.1810 0.191 NA
## Y0[t-6] -0.09 -0.2600 0.174 NA
## X1[t-6] 0.09 -0.2600 0.261 NA
## X2[t-6] -0.08 -0.2090 0.261 NA
## Y0[t-7] -0.09 -0.2540 0.209 NA
## X1[t-7] 0.17 -0.2540 0.344 NA
## X2[t-7] 0.03 -0.1050 0.344 NA
## Y0[t-8] -0.10 -0.2440 0.160 NA
## X1[t-8] 0.03 -0.2440 0.168 NA
## X2[t-8] 0.10 -0.1040 0.227 NA
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
From the above results we can see that seqICP was not able to find the correct set of causal parents for any of the equations. This result actually makes sense since our original DGP is stationary, and thus violates one of the assumptions of seqICP.
To solve for the stationarity issue we had before, we will try to artificially introduce a variance shift into the VAR(1) process we defined before.
To do so, recall that \(e \in \{a,b\}\) are two different environments. We define the following DGP for the variance shifted VAR(1) process:
\[ x^a_{1,t} = 0.2x^a_{1,t-1} + 0.7x^a_{3,t-1} + N(0, 1)\\ x^a_{2,t} = 0.2x^a_{2,t-1} + N(0, 1)\\ x^a_{3,t} = 0.7x^a_{1,t-1} + 0.2x^a_{2,t-1} + N(0, 1)\\ \]
\[ x^b_{1,t} = 0.2x^b_{1,t-1} + 0.7x^b_{3,t-1} + N(0, 1)\\ x^b_{2,t} = 0.2x^b_{2,t-1} + N(0, 1)\\ x^b_{3,t} = 0.7x^b_{1,t-1} + 0.2x^a_{2,t-1} + N(0, 3)\\ \]
Thus, the SCM of \(x_{1,t}\) is the same regardless of \(e\) but there is a variance shift between the environments caused by a change in distribution in \(x_{3,t-1}\).
Below we will sample from the above model:
set.seed(1)
n <- na + nb
x1 <- matrix(data = NA, nrow = n, ncol = 1)
x2 <- matrix(data = NA, nrow = n, ncol = 1)
x3 <- matrix(data = NA, nrow = n, ncol = 1)
for (i in 1:n){
# initialize all series
if (i == 1){
x1[i,] <- rnorm(n = 1, mean = 0, sd = 1)
x2[i,] <- rnorm(n = 1, mean = 0, sd = 1)
x3[i,] <- rnorm(n = 1, mean = 0, sd = 1)
next
}
# sample time series iteratively conditioned on each environment
if (i <= na){
x1[i,] <- 0.2*x1[i-1,] + 0.7*x3[i-1,] + rnorm(n = 1, mean = 0, sd = 1)
x2[i,] <- 0.2*x2[i-1,] + rnorm(n = 1, mean = 0, sd = 1)
x3[i,] <- 0.7*x1[i-1,] + 0.2*x2[i-1,] + rnorm(n = 1, mean = 0, sd = 1)
}
else{
x1[i,] <- 0.2*x1[i-1,] + 0.7*x3[i-1,] + rnorm(n = 1, mean = 0, sd = 1)
x2[i,] <- 0.2*x2[i-1,] + rnorm(n = 1, mean = 0, sd = 1)
x3[i,] <- 0.7*x1[i-1,] + 0.2*x2[i-1,] + rnorm(n = 1, mean = 0, sd = 3)
}
}
var_sim_shifted <- cbind(x1, x2, x3)
ts.plot(var_sim_shifted, type="l", col=c(1, 2, 3))
Here are the autocorrelation functions of the raw series
for (i in 1:dim(var_sim_shifted)[2]){
print(paste0("x", i))
ggtsdisplay(var_sim_shifted[, i])
}
## [1] "x1"
## [1] "x2"
## [1] "x3"
and the autocorrelation of the square of the series
for (i in 1:dim(var_sim_shifted)[2]){
print(paste0("x", i))
ggtsdisplay(var_sim_shifted[, i]^2)
}
## [1] "x1"
## [1] "x2"
## [1] "x3"
From the above ACF and PACFs we can see that the stationarity property was broken by introducing the shift in variance on \(x_{3,t}\).
Finally, we run seqICP on the new dataset:
var_sim_shifted_df <- var_sim_shifted %>% as.data.table()
colnames(var_sim_shifted_df) <- c("x1", "x2", "x3")
varnames <- colnames(var_sim_shifted_df)
Ss <- list()
Sdetails <- list()
for (vn in varnames){
X <- var_sim_shifted_df %>% dplyr::select(-one_of(vn))
y <- var_sim_shifted_df %>% dplyr::select(one_of(vn))
output <- seqICP(X=X, Y=y, model = "ar", stopIfEmpty = FALSE, silent = TRUE, )
Sdetails[[vn]] <- output
print(paste0("Target: ", vn, " Features: ", paste(colnames(X), collapse = " ")))
summary(output)
}
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## [1] "Target: x1 Features: x2 x3"
##
## Invariant Linear Causal Regression at level 0.05
## No variable shows a significant causal effect
##
## coefficient lower bound upper bound p-value
## intercept 0.02 -0.1188 0.1649 NA
## X1[t] 0.00 0.0000 0.0000 0.65
## X2[t] 0.00 0.0000 0.0000 0.61
## Y0[t-1] 0.11 -0.2287 0.2470 NA
## X1[t-1] -0.02 -0.1682 0.2860 NA
## X2[t-1] 0.70 -0.1662 0.7615 NA
## Y0[t-2] -0.03 -0.1805 0.7631 NA
## X1[t-2] 0.06 -0.1863 0.7603 NA
## X2[t-2] 0.05 -0.1885 0.2245 NA
## Y0[t-3] 0.10 -0.0736 0.2457 NA
## X1[t-3] -0.14 -0.2907 0.2505 NA
## X2[t-3] 0.03 -0.2922 0.2509 NA
## Y0[t-4] -0.07 -0.2864 0.1468 NA
## X1[t-4] 0.03 -0.2198 0.1788 NA
## X2[t-4] -0.09 -0.2183 0.1845 NA
## Y0[t-5] 0.17 -0.2077 0.3142 NA
## X1[t-5] 0.04 -0.2091 0.3244 NA
## X2[t-5] 0.06 -0.1156 0.3317 NA
## Y0[t-6] -0.11 -0.1926 0.1925 NA
## X1[t-6] -0.07 -0.2200 0.1686 NA
## X2[t-6] -0.12 -0.2341 0.0738 NA
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 212 are removed, to account for lag p=8
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 214 are removed, to account for lag p=6
## [1] "Target: x2 Features: x1 x3"
##
## Invariant Linear Causal Regression at level 0.05
## No variable shows a significant causal effect
##
## coefficient lower bound upper bound p-value
## intercept 0.01 -0.12460 0.1566 NA
## X1[t] 0.00 0.00000 0.0000 0.12
## X2[t] 0.00 0.00000 0.0000 0.12
## Y0[t-1] 0.29 -0.21840 0.4323 NA
## X1[t-1] 0.05 -0.08990 0.4125 NA
## X2[t-1] 0.02 -0.08020 0.4106 NA
## Y0[t-2] 0.00 -0.14880 0.2193 NA
## X1[t-2] -0.01 -0.15740 0.1879 NA
## X2[t-2] -0.06 -0.17780 0.1481 NA
## Y0[t-3] -0.02 -0.17670 0.1332 NA
## X1[t-3] 0.05 -0.17300 0.1938 NA
## X2[t-3] 0.06 -0.17480 0.2019 NA
## Y0[t-4] 0.06 -0.08920 0.2104 NA
## X1[t-4] 0.03 -0.11910 0.2186 NA
## X2[t-4] -0.02 -0.13240 0.2212 NA
## Y0[t-5] 0.04 -0.13460 0.1880 NA
## X1[t-5] 0.09 -0.14260 0.2394 NA
## X2[t-5] -0.07 -0.18010 0.2263 NA
## Y0[t-6] -0.15 -0.30060 0.2425 NA
## X1[t-6] 0.04 -0.29990 0.1867 NA
## X2[t-6] -0.09 -0.30610 0.0656 NA
## Y0[t-7] 0.03 -0.15930 0.1805 NA
## X1[t-7] -0.04 -0.18770 0.1087 NA
## X2[t-7] -0.05 -0.16000 0.0690 NA
## Y0[t-8] 0.08 -0.07200 0.2245 NA
## X1[t-8] -0.02 -0.10330 0.0688 NA
## X2[t-8] 0.08 -0.02830 0.1888 NA
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 210 are removed, to account for lag p=10
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 210 are removed, to account for lag p=10
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 210 are removed, to account for lag p=10
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 210 are removed, to account for lag p=10
## [1] "Target: x3 Features: x1 x2"
##
## Invariant Linear Causal Regression at level 0.05
## Model has been rejected at the chosen level 0.05, that is no subset of variables leads to invariance across time. This can be for example due to presence of
## (a) non-linearities or
## (b) hidden variables or
## (c) interventions on the target variable.
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
As we can see, seqICP was not able to identify the invariant set correctly for the variance shifted VAR(1) process.
We hypothesize that the reason why seqICP was not able to find the correct set of causal parents is because the original DGP must have at least one contemporaneous relationship. This scenario is precisely the context of structural vector autoregressive models (SVAR).
In this context, we will try to artificially introduce a variance shift into the SVAR(1) process we defined before.
To do so, recall that \(e \in \{a,b\}\) are two different environments. We define the following DGP for the variance shifted SVAR(1) process:
\[ x^a_{1,t} = 0.7x^a_{3,t} + N(0, 1)\\ x^a_{2,t} = 0.2x^a_{2,t-1} + N(0, 1)\\ x^a_{3,t} = 0.2x^a_{2,t} + N(0, 1)\\ \]
\[ x^b_{1,t} = 0.7x^b_{3,t} + N(0, 1)\\ x^b_{2,t} = 0.2x^b_{2,t-1} + N(0, 1)\\ x^b_{3,t} = 0.2x^a_{2,t} + N(0, 3)\\ \]
It can be seen by the above equations that \(x_{1,t}\) depends on \(x_{3,t}\). Below we simulate from this process:
set.seed(1)
n <- na + nb
x1 <- matrix(data = NA, nrow = n, ncol = 1)
x2 <- matrix(data = NA, nrow = n, ncol = 1)
x3 <- matrix(data = NA, nrow = n, ncol = 1)
for (i in 1:n){
# initialize all series
if (i == 1){
x1[i,] <- rnorm(n = 1, mean = 0, sd = 1)
x2[i,] <- rnorm(n = 1, mean = 0, sd = 1)
x3[i,] <- rnorm(n = 1, mean = 0, sd = 1)
next
}
# sample time series iteratively conditioned on each environment
if (i <= na){
x2[i,] <- 0.2*x2[i-1,] + rnorm(n = 1, mean = 0, sd = 1)
x3[i,] <- 0.2*x2[i,] + rnorm(n = 1, mean = 0, sd = 1)
x1[i,] <- 0.7*x3[i,] + rnorm(n = 1, mean = 0, sd = 1)
}
else{
x2[i,] <- 0.2*x2[i-1,] + rnorm(n = 1, mean = 0, sd = 1)
x3[i,] <- 0.2*x2[i,] + rnorm(n = 1, mean = 0, sd = 3)
x1[i,] <- 0.7*x3[i,] + rnorm(n = 1, mean = 0, sd = 1)
}
}
var_sim_shifted2 <- cbind(x1, x2, x3)
ts.plot(var_sim_shifted2, type="l", col=c(1, 2, 3))
Here are the autocorrelation functions of the raw series
for (i in 1:dim(var_sim_shifted2)[2]){
print(paste0("x", i))
ggtsdisplay(var_sim_shifted2[, i])
}
## [1] "x1"
## [1] "x2"
## [1] "x3"
and the autocorrelation of the square of the series
for (i in 1:dim(var_sim_shifted2)[2]){
print(paste0("x", i))
ggtsdisplay(var_sim_shifted2[, i]^2)
}
## [1] "x1"
## [1] "x2"
## [1] "x3"
Finally, we run seqICP on the new dataset:
var_sim_shifted_df2 <- var_sim_shifted2 %>% as.data.table()
colnames(var_sim_shifted_df2) <- c("x1", "x2", "x3")
varnames <- colnames(var_sim_shifted_df2)
Ss <- list()
Sdetails <- list()
for (vn in varnames){
X <- var_sim_shifted_df2 %>% dplyr::select(-one_of(vn))
y <- var_sim_shifted_df2 %>% dplyr::select(one_of(vn))
output <- seqICP(X=X, Y=y, model = "ar", stopIfEmpty = FALSE, silent = TRUE, )
Sdetails[[vn]] <- output
print(paste0("Target: ", vn, " Features: ", paste(colnames(X), collapse = " ")))
summary(output)
}
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 210 are removed, to account for lag p=10
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 210 are removed, to account for lag p=10
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 212 are removed, to account for lag p=8
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 212 are removed, to account for lag p=8
## [1] "Target: x1 Features: x2 x3"
##
## Invariant Linear Causal Regression at level 0.05
## Model has been rejected at the chosen level 0.05, that is no subset of variables leads to invariance across time. This can be for example due to presence of
## (a) non-linearities or
## (b) hidden variables or
## (c) interventions on the target variable.
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
##
## [1] "Target: x2 Features: x1 x3"
##
## Invariant Linear Causal Regression at level 0.05
## No variable shows a significant causal effect
##
## coefficient lower bound upper bound p-value
## intercept 0.03 -0.1069 0.17 NA
## X1[t] 0.00 0.0000 0.00 0.36
## X2[t] 0.00 0.0000 0.00 0.51
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).
## Warning in seqICP.s(X, Y, S[[1]], test, par.test, model, par.model): grid
## points larger than 210 are removed, to account for lag p=10
## Warning in seqICP.s(X, Y, S[[ind]], test, par.test, model, par.model): grid
## points larger than 210 are removed, to account for lag p=10
## [1] "Target: x3 Features: x1 x2"
##
## Invariant Linear Causal Regression at level 0.05
## Model has been rejected at the chosen level 0.05, that is no subset of variables leads to invariance across time. This can be for example due to presence of
## (a) non-linearities or
## (b) hidden variables or
## (c) interventions on the target variable.
##
## Confidence intervals are only approximate due to post selection (pknown is FALSE).